MotifPrototyper: a Bayesian profile model for motif families.
Abstract
In this article, we address the problem of modeling generic features of structurally but not textually related DNA motifs, that is, motifs whose consensus sequences are entirely different but nevertheless share "metasequence features" reflecting similarities in the DNA-binding domains of their associated protein recognizers. We present MotifPrototyper, a profile Bayesian model that can capture structural properties typical of particular families of motifs. Each family corresponds to transcription regulatory proteins with similar types of structural signatures in their DNA-binding domains. We show how to train MotifPrototypers from biologically identified motifs categorized according to the TRANSFAC categorization of transcription factors and present empirical results of motif classification, motif parameter estimation, and de novo motif detection by using the learned profile models.
Related articles
Bayesian Inference for Spatial Beta Generalized Linear Mixed Models
In some applications, the response variable assumes values in the unit interval. The standard linear regression model is not appropriate for modelling this type of data because the normality assumption is not met. Alternatively, the beta regression model has been introduced to analyze such observations. A beta distribution represents a flexible density family on (0, 1) interval that covers symm...
A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences
The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
Similarity Analysis between Transcription Factor Binding Sites by Bayesian Hypothesis Test
Transcription factor binding sites (TFBS) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFM). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices. We propose to identify and group similar pro...
Bayesian Clustering of Transcription Factor Binding Motifs
Genes are often regulated in living cells by proteins called transcription factors (TFs) that bind directly to short segments of DNA in close proximity to specific genes. These binding sites have a conserved nucleotide appearance, which is called a motif. Several recent studies of transcriptional regulation require the reduction of a large collection of motifs into clusters based on the similar...
Machine Learning Approaches to Biological Sequence and Phenotype
Machine Learning Approaches to Biological Sequence and Phenotype Data Analysis. Renqiang Min, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2010. To understand biology at a system level, I presented novel machine learning algorithms to reveal the underlying mechanisms of how genes and their products function in different biological levels in this thesis. Specif...
Journal:
- Proceedings of the National Academy of Sciences of the United States of America
Volume 101, Issue 29
Pages: -
Publication date: 2004